Statistical Science 

2009, Vol. 24, No. 2, 191-194 

DOI: 10.1214/09-STS284REJ 

Main article DOI: 10.1214/09-STS284 

© Institute of Mathematical Statistics, 2009 



Rejoinder: Harold Jeffreys's Theory of 
Probability Revisited 

Christian P. Robert, Nicolas Chopin and Judith Rousseau 






Abstract. We are grateful to all discussants of our re-visitation for 
their strong support in our enterprise and for their overall agreement 
with our perspective. Further discussions with them and other leading 
statisticians showed that the legacy of Theory of Probability is alive 
and lasting. 

1. ON BERNARDO'S COMMENTS 

We cannot but agree with most issues raised by 
Professor Bernardo, first and foremost the impor- 
tant distinction between testing and estimation. The 
multidimensional Jeffreys prior (for estimation) is 
certainly not formally defined within Theory of Prob- 
ability and the multiplication of cases in the book 
does not help. We alas have no clear explanation as 
to why most Jeffreys priors produce proper posteriors 
for all datasets. While the Lindley-Jeffreys paradox 
may be upsetting (although it mostly highlights the 
discrepancy between the frequentist and the Bayesian 
answers), we nonetheless consider the construction of 
a testing Jeffreys prior in Section 5.2 an interesting 
if incomplete attempt, made concrete much later by 
Bayarri and Garcia-Donato (2007). We understand 
Professor Bernardo's point of view on Bayes 
factors, but still resist the temptation to throw away 
this useful tool, as discussed below in conjunction 
with Professor Lindley's comments. We are nonetheless 
sympathetic to the intrinsic discrepancy measure 
as an invariant loss function, even though the 
need to scale this measure and to select the 
subsequent bound between acceptance and rejection 
remains a strong impediment to adopting this 
alternative. (The fact that it depends on the sample 
size n is certainly a major drawback, although one 
could reasonably object that the bound between ac- 
ceptance and rejection associated with the Bayes 
factor should also depend on the sample size.) 
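
The Lindley-Jeffreys paradox mentioned above can be illustrated numerically. The following is a minimal sketch under assumptions that are ours and not taken from Theory of Probability: a normal sample of size n with known variance sigma2, a point null H0: theta = 0, and a N(0, tau2) prior on theta under the alternative. The function name and default values are illustrative only. For a test statistic held fixed at z = 1.96 (a two-sided p-value of about 0.05), the Bayes factor in favor of H0 grows without bound as n increases.

```python
import math


def bayes_factor_01(z, n, tau2=1.0, sigma2=1.0):
    """Bayes factor of H0: theta = 0 against H1: theta ~ N(0, tau2).

    The data are n observations from N(theta, sigma2) and
    z = sqrt(n) * xbar / sigma is the usual test statistic.
    Under H0, xbar ~ N(0, sigma2 / n); under H1, xbar ~ N(0, tau2 + sigma2 / n).
    """
    # Ratio of the prior variance to the sampling variance of xbar.
    r = n * tau2 / sigma2
    return math.sqrt(1.0 + r) * math.exp(-0.5 * z * z * r / (1.0 + r))


if __name__ == "__main__":
    # Same z (same p-value) for every n: the frequentist answer is unchanged,
    # while the Bayes factor increasingly supports H0.
    for n in (1, 100, 10_000, 1_000_000):
        print(n, bayes_factor_01(1.96, n))
```

Under these assumptions the Bayes factor behaves like sqrt(n) for large n at fixed z, which is one way of seeing why a fixed bound between acceptance and rejection cannot agree with a fixed significance level at all sample sizes.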

2. ON GELMAN'S COMMENTS 

We first apologize to the authors of Gelman et al. 
(2001) for not ranking them among the "classics" of 
our first footnote. This choice was, however, delib- 
erate: we wanted to stop short of comparing the 
most recent textbooks of the late 1990s (excluding 
as well Robert, 1994). At a more foundational level, 
the debate about the choice of a noninformative or 
of a weakly informative prior is endless, hopeless 
and possibly fruitless, in that (a) there is no way a 
single perfect noninformative prior can be adopted 
by one and all except through a formal decision 
from the community to always use Jeffreys prior as 
a default (in the same way the Black-and-Scholes 
formula is used by financial analysts as a common 
ground, not as a representation of real series); and 
(b) noninformative and informative priors are not 
two well-separated categories but form a continuum. 
It thus seems more fruitful to try to build measures 
that assess the impact of a given prior (or of 
the variation of a parameter in a family of priors). 
The debate about complexity is more in line with 
our views: (a) similar to the notion of a universal 
noninformative prior, a practical implementation of 
Ockham's razor does not exist; and (b) complexity 
is quite a subjective factor. This is not to 
say that we reject the Jeffreysian tenet that Bayes 
factors naturally downweight complex models with 
limited support from the data, since we support this 
point, but rather that the support for more complex 
models should come from the prior or from a loss 
function, rather than from the complexity-hungry 
likelihood. (The social scientist attitude that worries 
about missing some factor could also be questioned 
as being too optimistic in its belief in models.) Fi- 
nally, we concede that Bayesian data analysis may 
force us to move away from the "ideal" standards set 
by Harold Jeffreys's Theory of Probability, including 
the reliance on the Bayes factor. Bayesian model 
criticism, indeed a major direction in Gelman et al. 
(2001), is still in its infancy and rightly deserves 
more emphasis in our papers and in our practice! 
As put by Professor Gelman, we need to learn 
more from "the failures of a statistical model's at- 
tempt to capture reality." 

3. ON KASS' COMMENTS 

We are grateful to Professor Kass for his com- 
ments that follow a talk given during the Harold 
Jeffreys's Theory of Probability anniversary session 
at the O-Bayes 2009 meeting. There is actually very 
little we can disagree with in these comments which 
show a deep and scholarly knowledge of Theory of 
Probability and expose our need to pursue our study 
of this profound book. 

The connection with geometry was bound to be 
part of Professor Kass' comments and we do agree 
with the essential feature of looking for orthogonal 
parameterization, a point which, in our awkward 
phrasing, we would relate to the search for a constant 
information parameterization. We also appreciate 
the emphasis on Laplace's approximations that 
permeate the book and provide an early link with 
Bayesian asymptotics. The epistemological implica- 
tions of Theory of Probability are certainly worth 
stressing (a point also made by Professor Zellner) 
if only because Harold Jeffreys was first and fore- 
most a physicist who developed his own statistical 
tools to deal with his own physics problems. The 
specific points made by Professor Kass about the na- 
ture of statistical models would be worth emphasis- 
ing during any course in applied and even method- 
ological statistics (as are the central discussions by 
Erich Lehmann and David Cox in the 1990 volume 
of this journal). That Bayesian testing, or any kind 
of testing, remains a source for discussion and fur- 
ther research is clearly illustrated by the number of 
comments and the variety of proposals on this point. 
Finally, the lack of decision theory is an issue that we 
also deplore, in agreement with Professors Bernardo 
and Lindley as well, if not Professor Zellner. 

4. ON LINDLEY'S COMMENTS 

Besides so kindly contributing to this discussion, 
Professor Lindley patiently and helpfully 
enlightened us on the construction and contents of 
Theory of Probability during the preparation of the 
paper. We are therefore deeply indebted to him for 
sharing so much with us. His comments bring a 
unique perspective to the discussion, both from his- 
torical and foundational viewpoints. As a witness 
of the early developments of Theory of Probabil- 
ity, Professor Lindley exposes the philosophical cum 
practical reasons for the composition of this book. 
The point about Section 3.10 and the integration 
over the sample space was missed in our analysis but 
is indeed crucial in its link with the likelihood prin- 
ciple that does not appear per se in Theory of Proba- 
bility. Nowadays, this is certainly the most standard 
example that illustrates how the principle for con- 
structing Jeffreys's priors may violate the likelihood 
principle (Berger and Wolpert, 1988). (The opposition 
with de Finetti's perspective is also worth noticing, 
since the two authors approached Bayesian statistics from 
fundamentally different perspectives, even though 
their respective books share the same title.) 
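
One common textbook instance of this violation, which we assume (without claiming it is the example of Section 3.10) is the kind of case meant here, compares binomial and negative binomial sampling. Both designs yield a likelihood proportional to theta^x (1 - theta)^(n - x), yet the Jeffreys prior, being built from the Fisher information and hence from an integral over the sample space, differs between them: it is proportional to theta^(-1/2) (1 - theta)^(-1/2) in the binomial case and to theta^(-1) (1 - theta)^(-1/2) in the negative binomial case (sampling until a fixed number of successes). The sketch below uses hypothetical function names and simply returns the resulting Beta posteriors.

```python
def jeffreys_posterior(successes, trials, design):
    """Beta(a, b) posterior parameters for a success probability theta.

    design = "binomial": the number of trials is fixed in advance;
        Jeffreys prior is Beta(1/2, 1/2).
    design = "negative_binomial": sampling stops at a fixed number of
        successes; Jeffreys prior is proportional to
        theta^(-1) * (1 - theta)^(-1/2), which is improper.
    The likelihood is theta^successes * (1 - theta)^failures in both cases.
    """
    failures = trials - successes
    if design == "binomial":
        return successes + 0.5, failures + 0.5
    if design == "negative_binomial":
        # Proper posterior only when at least one success is observed.
        return float(successes), failures + 0.5
    raise ValueError("design must be 'binomial' or 'negative_binomial'")


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


if __name__ == "__main__":
    # Identical data (3 successes in 12 trials), different stopping rules.
    for design in ("binomial", "negative_binomial"):
        a, b = jeffreys_posterior(3, 12, design)
        print(design, (a, b), beta_mean(a, b))
```

The two posteriors differ although the observed likelihoods are proportional, which is exactly what the likelihood principle forbids.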

The fact that uncertainty must be analyzed in 
probabilistic terms is certainly a driving force in 
Theory of Probability and a convincing reason to 
follow Bayesian ways. We completely agree that this 
formalization is one of Harold Jeffreys's great inputs. 
Once again, the other fundamental input stressed 
both by Professor Lindley and ourselves is the com- 
plete formalization of a coherent approach to testing 
via Bayes factors. Professor Lindley is 100% correct 
in his assessment of the opposition of this view with 
Popper's and of its persistence (Templeton, 2008): 
rejecting a model based on its "falsity" is only feasi- 
ble when considering the available alternatives. That 
Theory of Probability does not directly dwell on de- 
cisions is clearly a feature of the time, even though 
Keynes had opened the way a few years earlier, but 
this did not prevent a formalization of Bayesian test- 
ing procedures that proved itself compatible with 
"0-1" loss functions, thus showing the insight in 
Harold Jeffreys's intuitions. 
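
The compatibility with "0-1" loss functions can be made explicit in a few lines. This is a minimal sketch with hypothetical function names: under a 0-1 loss the Bayes decision accepts the hypothesis with the larger posterior probability, and the posterior odds are the Bayes factor multiplied by the prior odds.

```python
def posterior_prob_h0(b01, prior_h0=0.5):
    """Posterior probability of H0 given the Bayes factor B01 = m0(x) / m1(x)
    and the prior probability of H0 (assumed strictly between 0 and 1)."""
    prior_odds = prior_h0 / (1.0 - prior_h0)
    posterior_odds = b01 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def accept_h0(b01, prior_h0=0.5):
    """Bayes decision under 0-1 loss: accept H0 when its posterior
    probability exceeds 1/2, that is, when B01 > (1 - prior_h0) / prior_h0.
    At exactly 1/2 both decisions have the same posterior expected loss;
    this sketch then arbitrarily rejects H0."""
    return posterior_prob_h0(b01, prior_h0) > 0.5


if __name__ == "__main__":
    print(posterior_prob_h0(3.0), accept_h0(3.0))
    print(posterior_prob_h0(3.0, prior_h0=0.2), accept_h0(3.0, prior_h0=0.2))
```

With equal prior weights the rule reduces to comparing the Bayes factor with 1, so the testing procedure can be read as a decision rule even though no loss function is stated.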

5. ON SENN'S COMMENTS 

Given the tone of some earlier comments of Pro- 
fessor Senn on the Bayesian paradigm, we must ac- 
knowledge our pleasant surprise at his conciliatory 
tone in these comments. Thankfully, the barbed par- 
ody of Harold Jeffreys's most quoted sentence some- 
how re-establishes the balance! We are quite grate- 
ful to Professor Senn for his laudatory remarks, even 
though we must acknowledge that our copies of 
Harold Jeffreys's Theory of Probability are also full 
of pencil annotations and question marks, and also 
that it took two series of lectures to achieve this 
incomplete state of awareness. We furthermore en- 
joyed the mention that Harold Jeffreys considered 
Bayesian significance tests as the most important 
part of Theory of Probability, since this agrees with 
both Professor Lindley's and our perceptions. 

We dearly appreciate the further historical details 
provided by Professor Senn's comments, particularly 
in that the exchange between Ronald Fisher and 
Harold Jeffreys is represented in a much less 
controversial tone than we could have believed! [The 
first author also commented on Berger, Bernardo 
and Sun (2009) about the particular matter of the 
Law of Succession and so we do not need to repeat 
the comments here.] Similarly, the confusion about 
Bernoulli shows how amateurish our attempt at the 
history of science is. We are equally grateful to 
Professor Senn for pointing out Bartlett's connec- 
tion, as we must confess we were not even aware of 
it! When reading Bartlett's comments, we came to 
realize his contribution to the exclusion of improper 
priors for Bayes factors, as analyzed in greater detail 
by Bickel and Ghosh (1990). 

6. ON ZELLNER'S COMMENTS 

Unsurprisingly, Professor Zellner's comments (which 
he delivered quite enthusiastically during his lecture 
at O-Bayes 2009) open new vistas on Theory of 
Probability, while differing from our analysis 
on several points. The first issue is that Theory of 
Probability was aimed at scientists at large, while 
we read it as statisticians. This is unavoidable, given 
our background and, further, we doubt many non- 
statisticians would have the time and the will to go 
through Theory of Probability. Unfortunately, most 
of them seem to eschew modern Bayesian introduc- 
tions to the benefit of shorter reviews published in 
their own discipline. We completely agree with Pro- 
fessor Zellner that we failed to understand the his- 
torical undercurrents explaining the connection of 
Theory of Probability with the philosophy of science 
at the time it was written. This is not to say we 
missed the global impact of Theory of Probability on 
scientific modeling and its definition of induction, a 
point already stressed by Professor Kass, because it 
obviously represents the major impact of the book, 
but the style of the discussions about the axiomatic 
nature of probability and our lack of background in 
this area led us to bypass them to focus on the link 
with modern Bayesian statistics. (Neither does the 
"proof" of Bayes' theorem strike us as ultimately 
necessary, once the axiomatic definition of probability 
is agreed upon.) The coherence of the system for 
scientific induction is what struck us the most in 
Theory of Probability, even though we presumably 
skimmed too fast over this point. 

As already noted (with a different twist) in the dis- 
cussion about Professor Gelman's comments, there 
is no end to the debate about noninformative priors 
and, while Professor Zellner's maximal data 
information prior is an interesting alternative to 
Jeffreys's, Laplace's and Haldane's solutions, there is 
no reason to believe the community as a whole will 
eventually agree upon this point. We obviously ap- 
preciate the derivation of this prior based on a spe- 
cific information criterion developed by Professor 
Zellner. In a historical perspective, it may well be 
that the notions of "objective" or "noninformative" 
are not appropriate for the (Statistics of the) mid- 
1980s. 

The conclusion presented by Professor Zellner re- 
produces Seymour Geisser's assessment of Theory 
of Probability, for which we are both grateful and in 
complete agreement. 

7. CONCLUSION 

We are most grateful to the contributors for their 
lively discussions, which illustrate how influential 
Jeffreys's ideas still are today. Maybe the most strik- 
ing aspect in Theory of Probability is Harold Jef- 
freys's intuition that a completely coherent system 
could be designed for Bayesian analysis, a system 
upon which we are still building today. 